Error Rate Analysis of Labeling by Crowdsourcing
نویسندگان
چکیده
Crowdsourcing label generation has been a crucial component for many real-world machine learning applications. In this paper, we provide finite-sample exponential bounds on the error rate (in probability and in expectation) of hyperplane binary labeling rules for the Dawid-Skene (and Symmetric DawidSkene ) crowdsourcing model. The bounds can be applied to analyze many commonly used prediction methods, including the majority voting, weighted majority voting and maximum a posteriori (MAP) rules. These bound results can be used to control the error rate and design better algorithms. In particular, under the Symmetric Dawid-Skene model we use simulation to demonstrate that the data-driven EM-MAP rule is a good approximation to the oracle MAP rule which approximately optimizes our upper bound on the mean error rate for any hyperplane binary labeling rule. Meanwhile, the average error rate of the EM-MAP rule is bounded well by the upper bound on the mean error rate of the oracle MAP rule in the simulation.
منابع مشابه
Theoretical Analysis and Efficient Algorithms for Crowdsourcing
Theoretical Analysis and Efficient Algorithms for Crowdsourcing by Hongwei Li Doctor of Philosophy in Statistics University of California, Berkeley Professor Bin Yu, Chair Crowdsourcing has become an effective and popular tool for human-powered computation to label large datasets. Since the workers can be unreliable, it is common in crowdsourcing to assign multiple workers to one task, and to a...
متن کاملError Rate Bounds in Crowdsourcing Models
Crowdsourcing is an effective tool for human-powered computation on many tasks challenging for computers. In this paper, we provide finite-sample exponential bounds on the error rate (in probability and in expectation) of hyperplane binary labeling rules under the Dawid-Skene crowdsourcing model. The bounds can be applied to analyze many common prediction methods, including the majority voting ...
متن کاملError Rate Bounds and Iterative Weighted Majority Voting for Crowdsourcing
Crowdsourcing has become an effective and popular tool for human-powered computation to label large datasets. Since the workers can be unreliable, it is common in crowdsourcing to assign multiple workers to one task, and to aggregate the labels in order to obtain results of high quality. In this paper, we provide finite-sample exponential bounds on the error rate (in probability and in expectat...
متن کاملPerform Three Data Mining Tasks with Crowdsourcing Process
For data mining studies, because of the complexity of doing feature selection process in tasks by hand, we need to send some of labeling to the workers with crowdsourcing activities. The process of outsourcing data mining tasks to users is often handled by software systems without enough knowledge of the age or geography of the users' residence. Uncertainty about the performance of virtual user...
متن کاملAnalysis of Minimax Error Rate for Crowdsourcing and Its Application to Worker Clustering Model
While crowdsourcing has become an important means to label data, crowdworkers are not always experts— sometimes they can even be adversarial. Therefore, there is great interest in estimating the ground truth from unreliable labels produced by crowdworkers. The Dawid and Skene (DS) model is one of the most well-known models in the study of crowdsourcing. Despite its practical popularity, theoret...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013